Metastatistics of Extreme Values and 
its Application in Hydrology 

Massimiliano Ignaccolo 

Dept. of Earth & Ocean Sciences, Nicholas School of the Environment, Duke University, 

Durham, North Carolina 

Marco Marani 

Dept. of Earth & Ocean Sciences, Nicholas School of the Environment, Duke University, 

Durham, North Carolina 
DICE A and International Center for Hydrology, University of Padova, Padova, Italy 

November 14, 2012 

Abstract 

We present a novel statistical treatment, the "metastatistics of ex- 
treme events", for calculating the frequency of extreme events. This 
approach, which is of general validity, is the proper statistical frame- 
work to address the problem of data with statistical inhomogeneities. 
By use of artificial sequences, we show that the metastatistics pro- 
duce the correct predictions while the traditional approach based on 
the generalized extreme value distribution does not. An application of 
the metastatistics methodology to the case of extreme event to rainfall 
daily precipitation is also presented . 

1 Introduction 

The importance of predicting the frequency of rainfall extremes is paramount 
in the design of any major hydraulic structure for water resources manage- 
ment and flood control. The established statistical tools used in climate 
analyses and in the engineering practice moved away from the initial concept 
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of probable maximum precipitation [TJ towards an approach which defines de- 
sign events based on a specified probability of occurrence. The key concept 
used in this setting is the return period, T r , i.e. the average time interval be- 
tween two exceedances of the magnitude of the event considered, es h(T r , r) 
(r being the time scale of interest. We will focus on the illustrative case 
of r = 1 day in the present paper. The estimation of the event magnitude 
associated with a specified return period (the main design specification) is 
usually obtained by fitting an 'appropriate' extreme value distribution (e.g. 
[2],|3]). These distributions (EV1, EV2, and EV3, or the Generalized Extreme 
Value, GEV, distributions summarizing them, [U El El El IE] ) have become 
the common reference for extreme value analyses because their form can be 
derived theoretically by means of an asymptotic theory assuming the number 
of extreme events (i.e. larger than the magnitude of the event of interest) in 
any given year to be large. 

In practice, estimates of extreme rainfall are made by extracting the an- 
nual maxima from the series of precipitated amounts (for the time scale of 
interest), and by fitting the annual maxima series with a Gumbel distribution 
(EV1) to extrapolate the rainfall amount, h(T r , r), associated with the fixed 
return time. The Gumbel distribution is typically used in practice because it 
is the asymptotic distribution for rainfall maxima provided the distribution 
of rainfall (daily in this case) amounts (say, x) does not exhibit a slowly 
decaying tail (i.e. it decays faster than x~ a ,a > 1). The Lognormal [H] and 
Gamma JTU] distributions have been used to fit daily amounts of rainfall. 
However, the Authors of pj] provide a theoretical framework and exhaustive 
empirical evidence that the probability of exceedance, ^f(h), of daily rainfall 
amounts ( with h > 10 mm) is well fitted by a stretched exponential function, 
\I/(/i) oc exp(— h/C) w , C, w > 0. Using a statistics jargon we say that the 
distribution of daily amounts of rainfall is right-tail equivalent to a Weibull 
distribution. 

The absence of inverse power law tails in the distribution of daily amounts 
of rainfall has made the Gumbel distribution (EV1) the asymptotic distribu- 
tion adopted by the hydrological community to fit rainfall extremes. However 
several Authors report that the Gumbel distribution underestimates the ex- 
treme rainfall amounts (e.g. [121 Q21 HI])- The inadequacy of the Gumbel 
distribution according to [2] is due to two factors. (A) A slow rate of conver- 
gence to the asymptotic distribution. This is so because number of days with 
non null precipitation in one year is bound to be smaller than 365 while the 
extreme asymptotic theory is valid in the limit when maxima are extracted 
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from a large, ideally infinite, number of samples. This problem particularly 
affects maxima extracted from a Weibull distribution: see [T5J [IB] for a detail 
discussion on the rate of convergence. (B) Inhomogeneity of the precipitation 
time series. For the extreme value theorem to apply is necessary the stability 
of the distribution of the variates from which the maxima are extracted. This 
may not be the case in many practical application: e.g. the functional form of 
the distribution may be fixed but the parameters describing the distribution 
may be themselves a stochastic variable. 

In[2j|3] Koutsoyannis argues that when the above mentioned factors, (A) 
and (B), play a role, rainfall extremes should be fitted to the Frechet (EV2) 
distribution ([!]), which has a slower decaying tail than the Gumbel distri- 
bution. In this manuscript we argue against this choice. A slow rate of 
convergence to the asymptotic distribution, factor (A), does not justify the 
use of the asymptotic form relative to another basin of attraction (from EV1 
to EV2). Instead one should use the "penultimate" approximation, the ap- 
proximation prior to the "ultimate" asymptotic expression, to fit the maxima 
series. In the case of variates of the exponential type an analytical expression 
for the penultimate approximation can be derived [T51 HSj • A variate is said to 
be of the exponential type if \l/(/i)=exp —g(h), where ^(h) is the probability 
of exceeding the threshold h, and g(h) is a positive function which increases 
monotonically faster than ln(/i). This is the case of Weibull variates so that 
we can apply it to extreme daily amounts of rainfall. The adoption the EV2 
asymptote lacks justification also in the case of inhomogeneity, factor (B). 
To address this problem we need to operate in a fashion similar to the case 
of inhomogeneous Poisson processes. As the rate itself of the Poisson process 
is a stochastic variable, the calculation of any variables of interest includes 
an integration over all possible values of the rate. The same procedure needs 
to be adopted in the case of inhomogeneous extreme events. We dub this 
approach as the Metastatistics Extreme Value (MEV) one. The MEV ap- 
proach applied to the penultimate approximation is proper tool to examine 
the occurrences of extremes in series of daily amounts of precipitation. We 
accomplish this by use generated sequences of 50 maxima generated from 
mixtures (variable scale and shape parameter) of Weibull variates. These 
sequence are used to evaluate the intensity of daily precipitation relative to 
return times up to 1,000 years. The MEV approach yields the correct results 
while the one based on the EV2 asymptote results in a systematic overes- 
timation. Moreover we apply the MEV approach to the historic (from 1725 
to 2006 albeit not continuously) time series of daily precipitation amounts 
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collected at Padova, Italy. 

The manuscript is organized as follows. In Section [2] we describe the data 
set used for this analysis. In Section [3] we briefly summarize the classical 
extreme event theory, the precondition practice, and introduce the metas- 
tatistics extreme value (MEV) formula, Our results are exposed in Section 
IU and our conclusion are drawn in Section 

2 Data 

We consider the daily rainfall amount observed at Padova (Italy) over a span 
of almost three centuries. During this period different, albeit structurally 
similar, instruments have been adopted at three different locations, which all 
fall within a 1 Km circle. The dataset is freely downloadable, and we refer 
the reader to [17] for previous analyses of the Padova time series. 

Our data set is composed of three intervals of continuous observations, 
1725-1764, 1768-1814, and 1841-2006, which are later further divided into five 
subintervals (1725-1764, 1768-1807, 1841-1880, 1887-2006, and 1841-1920) to 
explore different inhomogeneity hypotheses. 

3 Methods 

It is first useful to briefly summarize the Extreme Event Theory, as typically 
used in hydrology, then present the practice of "penultimate" approximation 
for Weibull variates, and to introduce the use of a metastatics to estimate 
extreme events associated with an assigned return period. 

3.1 Extreme Value theorem 

Let X be a stochastic variable and p(x) its probability density function, 
F(x) = P(X < x) its distribution function, and ^(x) = 1 — F(x) its com- 
plementary distribution function. We can define a new stochastic variable, 
Y n , as the maximum of n (an integer number) realizations of the stochastic 
variable X: Y n = max{xi,X2, ...,x n }. Y n is the n-sample maximum (n is the 
cardinality, order, of the maximum) of the "parent" stochastic variable X . If 
the events generating the realizations of X are independent, the cumulative 
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distribution, ((y), of Y n may be expressed as 

C(v) = [F(y)] n - (i) 

Upon definition of a renormalized variable SVi = (Y n — b n )/a n , the extreme 
value theorem jU Ej establishes that 

lim P(S n <s)= lim ((s) = lim [F(a n • s + 6 n )] n = H(s) (2) 

n— >oo n— >oo n— >oo 

where a n > and fe n are "renormalization" constants. The function H(s) in 
Eq. 02]) must be one of the three following types (excluding the degenerate 
case, in which all the probability is concentrated in one value of the random 
variable): 

0) v s 

s > and s < } (3) 
s < and 1 s > 

The type of limiting distribution is determined by the property of the 
distribution of the parent variable X [UGH [6]. In particular, 

EV3 u = supF(x) < 1 < +oo and 
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The three asymptotic types, EV1-EV3, can be thought of as special cases of 
a single Generalized Extreme Value distribution (GEV) [Hj: 

COO = H GEV {s) = exp | - ^1 + k S —^j } (5) 

where (.)+ = max(.,0), \i is the location parameter, o > is the scale pa- 
rameter, and A; is a shape parameter. The limit k = corresponds to the 
EV1 distribution, k > to the EV2 distribution (with a = 1/k,) and k < 
to the EV3 distribution (with a = —1/k). The function Hgev{ s ) is usually 
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fitted to the cumulative distribution of non-normalized maxima, so that the 
location parameter \x and the scale parameter a are the renormalization pa- 
rameters b n and a n respectively. However, it is important to note that the 
distribution describing the n-sample maximum will strictly be a GEV only 
for 'large enough' values of n. How large the value of n needs to be should be 
determined by analyzing the convergence properties based on the observed 
realizations of X. 



3.2 Penultimate approximation for Weibull variates 

The expected largest value, U n , in n realizations of the variable x is the one 
that is exceeded with probability 1/n: 

= - <=► U n = ^ (-) = F^ 1 ( 1 - -). (6) 



n \n \ n 



Using this result we can write the cumulative probability ((y) for the n- 
sample maximum Y n as 



C{y) = [F{y)] n = [i - ^{y)V 



(7) 



for y > U n the term ^ \y) / 'ty (U n ) < 1 such that, for n large enough we can 
substitute the Cauchy approximation ((1 — x) a = 1 — ax = exp(— ax) when 
x 1) in Eq.([7]) to obtain: 

Eq. (IH]) is referred to as the "penultimate" approximation [161 HH]: the 
approximation prior to the "ultimate" approximation being given by the 
extreme value theorem Eq.(j2]). The error made in adopting the Cauchy 
approximation depend only on the value of n (cardinality of the maximum) 
and can be quantified calculating the relative error e{U n ) associated with the 
mode U n [IS]- In this case the approximated value is, from Eq. (jSJ), exp(— 1) 
while the exact value is, from Eq. (J7|), (1 — l/n) n . A plot of e(U n ) as a 
function of n is reported in Fig. 1 of [16]: e.g. for n = 50 the corresponding 
relative error is ^(L^O) = 0.01, Note that for values y > U n the relative error 
is smaller than e(U n ) as ^f(y) < 
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We now consider the case of variates of exponential type: = exp(— g(x)) 
where g(h) is a positive function which increases monotonically faster than 
\n(x). In this case from Eq. (JSJ) we obtain 

-H-HC(v))) = Kv)-h(u n ). (9) 

This last equation can be expanded in a Taylor series to obtain 



-ln(-ln(C(y))) 



dh{y) 



dy 



Un (y - Un) + ^ 



(y-Un) 

2! 



2 



(10) 



The extreme value theorem assures that for very large values of n, the linear 
term in Eq. ffTU]) dominates (TBJ [TS] , therefore 



-ln(-MCW)) = dh<y> 



dy 



(y-u n ) (11) 



which is the ultimate approximation. Eq. (flTj) is the Gumbel distribution 
(EV1) with location parameter U n and scale parameter [(dh{y)/dy)\u^\~ x - 
U n and [(dh(y) / dy)^]^ 1 are the renormalization coefficients b n and a n of 
the extreme value theorem, Eq. fl2]). In the case of the Weibull distribution 
h{x) = (x/C) w , thus using Eq.lflOl one obtains the following penultimate 
Taylor series approximation: 

will V u,_1 

-ln(-ln(C(y)))= (y-U n )+ 

w(w - l)(U n r- 2 (y - U n ) 2 (12) 
C w 2! 

For an exponential parent distribution (w — 1) only the linear term of Eq. fll2|) 
is non null (the derivative of h(y) of order >1 are all null). In this case the 
penultimate and the ultimate approximation are equivalent. The conver- 
gence to the Gumbel distribution for the cumulative distribution of maxima 
extracted from an exponential parent is extremely fast: dictated by precision 
of the Cauchy approximation used to obtain Eq.flB])). For values of the shape 
parameter the convergence to the Gumbel distribution is dictated by 

the rate with which the nonlinear terms in Eq. fTl2|) become negligible with 
respect to the linear term as n — > oo. It is well know that this convergence 
might be very slow (even n = 10 6 may not be sufficient for Eq. ffTTj) to be 
"valid") El USES]. 
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In this case, one should use the penultimate approximation of Eq.(TT21) and 
not the GEV distribution of Eq.(j5]) as an accurate (neglecting the error due to 
the Cauchy approximation) expression for the cumulative distribution of the 
n-sample maxima. However, if the shape parameter is not an integer value, 
then Eq. ([12]) has an infinite number of terms and cannot be easily computed. 
To overcome this limitation we use the practice of "preconditioning" [TBI [To] . 
We introduce the new variable z = x w , whose distribution is exponential, 
p(z) = £jj=Texp(—z/C), with C = C w . For the variable z the convergence 
of the cumulative distribution of n-sample maxima to the limiting Gumbel 
distribution is very fast and we can write (using Eqs.flB]) and (ITT]) 

-l n (-ln(C(y))) = l(y-C'lnn), (13) 

and finally 

-ln(-ln(C(y))) = ^(l/" - C w \nn). (14) 

This equation is an exact, neglecting the error due the Cachy approximation, 
expression (when n cannot be considered infinite, which is for all practical 
applications) for the probability ((y) for any value of the shape parameter w. 
Note that all the results obtained in this Section are also valid for varaites 
whose distribution function is right-tail equivalent to a Weibull [TH]. Two 
distribution functions F\ and F 2 are right-tail equivalent if (1 — Fi(x))/(1 — 
^2 0*0) — > 1 when x — > +oo. The results presented in [UJ indicate that 
the distribution function of the daily amount of precipitation is right-tail 
equivalent to the Weibull distribution function. 



3.3 Metastatistics 

The exceedance probability, C(y): °f the n-maximum Y n depends on the car- 
dinality, n, and on the parameters, 9 = (9i,..,6k), of the distribution of 
the parent variable x. To make this dependence explicit we now adopt the 
notation ((y,n, 9 ) instead of C{y)- Let us consider, as an example, the se- 
ries of maxima {Yj} = (Yi, Y" 2 , Y?) each of them with variable cardinality 
{rij} = (rix, ri2, tit) and whose parent variables, while sharing a common 

distribution, have different parameters {9j} = (9i , 6 2 , 8?). We want to 
find the probability ((y) that none of the maxima in the sequence {Y}} has a 
value smaller than y. This example has practical relevance. In fact a typical 
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hydrological application requires the estimate of daily rainfall amount with 
a return period of, say, 1,000 years from an observed time series of T years. 
The number of wet days, days with a non-zero rainfall amount, changes from 
year to year, inducing a different cardinality of yearly maximum. Moreover 
inhomogeneity may be considered to be present, by which the distribution 
of daily rainfall amounts has a constant functional, but changing parameters 
every year. The classical results of Extreme Value Theory are not designed to 
handle this they all postulate a constant cardinality of the n-sample 

maxima and to a homogeneous stochastic process. 

Given the sequence of maxima {Yj} (j = 1,2, ..,T), the probability ((y) 
for a maximum to not exceed the value y is simply 

ay) 7 »i//.» r ^). (15) 

3=1 

We use the term "metastatistics factor" to indicate fin, 6) the probability 
density function of observing a maximum with cardinality n and a parent 
distribution characterized by the parameters 9 . With this definition we can 
write Eq. (1151) in the more general form 

C(y) = J J dnd9 f(n, 7) ((y, n, t), (16) 

where the symbol d6 denotes the differential dQ\dQ2---d&k- Note that n is 
integer variable but we keep for convenience a continuous notation with the 
understanding that the probability density function f(n, 9) is punctual in 
the variable n. We refer to Eq. ffT6]) as the Metastatistics of Extreme Value 
(MEV) formula. Note that Eq. ffT6l) reduces to Eq. fTl5|) for the metastatistics 
factor f(n, 9) — ^ J2j=i <K n ~ n ji ® ~ 

3.3.1 Variable maximum order for Weibull variates with fixed 
scale and shape parameters 

Hereby we consider where maximum values are extracted from a 

Weibull parent distribution with fixed scale C and shape w parameters but 
with a varaible order [n not fixed). This case reflect a situation where the 
probability of the daily amounts of precipitation is homogeneous (invaraint 
from one year to the next) but the number of wet days (daily amount >0) 
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in one yaer is not fixed (as it usually the case). Using Eqs. (jl6p and f|14jl we 
write 



"2 yW 

C(y) = ^2f(n j )exp(-exp(-— +lnn i )) = 

;:; m 

m 

In the above equation {3(y) = exp(— exp(— y w /C w )), while ri\ and rii are 
the minimum and maximum values for the order n (minimum and maxi- 
mum number of wet days), and f(rij) is the frequency with which the order 
rij is present in the maxima series. For y 3> C the terms [f3(y)] nj can be 
approximated as follows 



[/3(y)p =[exp(-exp(-|^))]^ ~ [1 - exp(^)^ 
1 -n ie xp(— ). 



(18) 



When we insert this result in to Eq.( TT7|) we get 



1 (V 
1 -n j exp( — 



1— < n > exp 



exp(-exp(-^(^ + C w \n < n >))), 



(19) 



where < n > indicates the average value of the maximum order. Notice that 
the last term of Eq.f ll9p is identical to Eq. ffT^j) with < n > instead of n. Thus 
for y ^> C the distribution function ((y) of the mixture considered (fixed 
shape and scale parameters for the parent variable but varaible maximum 
order) is equal to the distribution function relative to a fixed order, this 
order being the average of the mixture of orders. The hypothesis that the 
parent variable is a Weibull variate has been essential in deriving this result. 
Therefore it may not be valid for variate which are not of the Weibull type. 
In the case where the shape w and scale C parameter in Eq. (fT4l) are also 
variable, one can repeat the above arguments for y ^> C max {C max being the 
maximum scale in the mixture). If the order of maximum n and the scale and 
shape parameters are independent from each other the metastatistics factor 
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f(n,C,w) can be consider as the product of two factors f (n) and f p (C,w) 
and write 



C(v) 



I 



dCdw f p (C, w) exp(— exp( 



C w 



1 



(y w + C w In <n>))), 



(20) 



4 Results 

We first study the statistics of the daily amount of precipitation for our data 
set. Next, we show that the MEV formula, Eg. (fit)]) , together with Eq. (THl) 
are the correct tool to estimate the distribution function of maxima drawn 
from a mixture of Weibull variates, while the GEV formula, Eq. (j5]) , is not. 
Finally we apply the MEV approach to our data set and draw conclusions 
on its homogeneity and prediction on the structural stability for hydrological 
purposes. 

4.1 Pad ova time series and Weibull approximation 

The Padova time series has three intervals of continuous observation: 1725- 
1764, 1768-1814, and 1841-2006. For each interval we calculate the proba- 
bility ^(h) for the daily amount of rainfall to exceed a threshold h. The 
results are reported in panel (a) of Fig{TJ We see how all three intervals have 
similar complementary distribution functions. In panel (b) of the same figure 
we compare the probability ^f(h) relative to the 1841-2006 interval with the 
probabilities calculated for each year of the interval: the "cloud" of yearly 
curves is approximatively symmetric with respect the curve relative to the 
entire interval. In panels (c) and (d) of FigJT]we display the results of fitting 
the observed probability ^(h) (squares) with a stretched exponential func- 
tion (exp(h/C) w ) for h >10 mm. Two fitting methodologies are adopted: the 
least square fit (solid line) and the maximum likelihood (dashed line). The 
result in panel (c) are relative to the 5 years interval 1841-1845, and those 
in panel (d) to 5 years interval 1926-1930. In most cases the least square 
fit and the maximum likelihood one produce similar results as in the case 
depicted by panel (c). However, due to the left truncation (h >10 mm) the 
algorithm (Matalb 2012) which minimize the likelihood is not always per- 
forming properly as in the case depicted in panel (d): the least square fit is 
a better approximation than the maximum likelihood fit. Moreover the al- 
gorithm maximizing the likelihood fails sometimes to find a maximum when 
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too few data are available. E.g. when one considers only the data from a 
single year or two years, the condition h > 10 mm reduces the number of 
available points to ~5-10. The least square fit, although it is not the most 
proper choice [19], does not suffer from these limitation and therefore we 
adopt in the following to show the validity of the metastatistics formula. 
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Figure 1: The probability $(h) for the daily amount of rainfall to exceed a 
value h. Panel (a): ^(h) for the three time intervals of observation considered 
for the Padova time series (points). Panel (b): ^{h) for the entire 1841-2006 
interval (squares) and for each year of the interval (solid lines). Panels (c) 
and (d): ^(h) (full squares) together with the Weibull fits in the range h > 10 
mm by the least square method (LS) and by maximum likelihood method 
(ML). 

Next we consider the variability of the yearly maximum M yr , number of 
wet days n w (days with a non null precipitated amount) together with that of 
the scale C and shape w parameter of Weibull fitted to the complementary 
distribution function *&(h) given h > 10 mm. The results are reported in 
Fig. [2j In the left panels the areas shadowed in gray indicate four of the five 
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subsets (see Sec. II) considered for separate analysis: 1725-1764, 1768-1807, 
1841-1880, and 1887-2006. Right panels, report the observed frequencies 
corresponding to the quantities depicted in left panels. The top left panel 
depicts the variability of the annual maxima. The distribution has a plateau 
in the region 40-65 mm with a positive skewness. The middle-top panel 
depicts the variability of n w . The mode is ~100 days with a second peak of 
almost equal intensity at ~120 day. 

To address the issue of homogeneity for the daily amount of rainfall dy- 
namics we operate as follows. We first consider the value of the scale C and 
shape w parameters for each of the five subsets considered for our analysis 
(Sec. [2]). Then, each subset is divided in 10,5,2, and 1 year long not over- 
lapping windows. Inside each window the scale and shape parameters are 
calculated. The results of this procedure are reported on the midlle-bottom 
panel (c), scale, and bottom panel (d), shape, of Fig. |2j The blue line refers 
to the result obtained with 10 years moving window, the red line to those 
obtained with 1 year moving window, and the black line to the result for the 
entire subset. For a better visualization the results obtained with 2 years 
and 5 years moving windows are not reported. From a visual inspection of 
these results we can formulate the following hypotheses. The intervals 1725- 
1764, 1768-1807, and 1887-2006 are intervals during which the daily amount 
of precipitation can be considered homogeneous: the variability of the scale 
and shape parameters seems to be quite symmetrical with respect the values 
calculated using the entire interval. This hypothesis is maybe true also for 
1841-1880 interval, while the interval 1841-1920 is one for which the rainfall 
process at a daily scale cannot be considered homogeneous (even if the re- 
sults relative to this interval are explicitly reported the observations relative 
to the 1841-1880 interval and the first 40 year of the 1887-2006 suggest this 
conclusion) . In Section 14.31 we will test more rigorously these hypotheses. 

4.2 MEV distribution vs GEV distribution 

In the following we will compare the metastatistics approach, Eq. (1161) with 
the one based on fitting the maxima series to the generalized extreme value 
distribution, Eq. fl5]). We show that the metastatistics is the correct ap- 
proach in case of inhomogeneity. In particular we show that, in the case of 
maxima extracted from a mixture of parent Weibull variates, the adoption 
of the penultimate approximation, Eq.f lT4|) coupled with the metastatistics 
formulation, Eq.f lTB]) . are the proper tools to address the question of the 
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projected frequency of extreme events. 

For this purpose we consider three experiments using artificially gener- 
ated sequences. Experiment (1) Maxima are extracted with a fixed cardinal- 
ity and from a Weibull parent variable with fixed scale and shape parameter. 
This experiment correspond to consider a homogeneous rain dynamics with 
a fixed amount of wet days for each year. Experiment (2) Maxima extracted 
with a fixed cardinality and from a Weibull parent variable which scale and 
shape parameter changing every 5 maxima extractions. This experiment cor- 
responds to consider a rain dynamics which homogeneous (stable) for 5 years 
after which a new condition is achieved. The number of wet days is fixed. Ex- 
periment (3) Maxima extracted with a fixed cardinality and from a Weibull 
parent variable which scale and shape parameter changing every 2 maxima 
extractions. This experiment is analogous to the previous one except that 
the rain dynamics is stable only for 2 years. To mimic conditions which are 
typically encountered in rainfall time series, we set the number of maxima to 
be 50 (50 years of data) and the cardinality of the maxima to be 100 (100 wet 
days per year). The scale and shape parameters are those obtained from the 
Padova time series adopting the 50 years interval from xxxx to yyyy. Then 
we proceed as follows. For each experiment we generate the corresponding 
sequence of 50 "years" each with 100 "days" of non null precipitation. These 
variates are used to calculate the scale C and shape w parameters which are 
fed in Eq. (j!4p and into Eq. (ll5j) to calculate the MEV estimate of the distri- 
bution funciton ((y), the cumulative distribution of maxima. Moreover for 
each year, we calculate the maximum to obtain a sequence of maxima which 
we fit (using the minimum likelihood method) to the generalized extreme 
value distribution (Eq. (jSJ)) and to the Gumbel distribution to obtain the 
GEV and Gumbel estimates of ((y) respectively. We repeat this procedure 
1,000 times so that for each value y we can calculate the median value of 
((y) of the 1,000 realizations. The median values relative to the MEV, GEV, 
and Gumbel methodology are compared in a Gumbel plot: — ln(— ln(£(y))) 
versus y. To assess which of these three methodologies is the most accurate 
we generate, for each experiment, a sequence 10 7 maxima (this is done simply 
buy parsing together series of 50 maxima generated according the prescrip- 
tion of each experiment) rrij which we sort in ascending order. The sorted 
sequence is used to create the couples of points (— ln(— hx((J — 0.5/10 7 ))), mj) 
which are used as "truth" in the Gumbel plot of — ln(— hi(((y))) versus y. 

The results are show in Fig. [31 Panels (a), (b) and (c) refer respectively 
to experiments (1), (2) and (3). Blue curves indicate the MEV median es- 
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timate, red curves the GEV median estimate, and green curves the Gumbel 
median estimate. Black squares represent the "truth" values. Pink curves in 
panels (b) and (c) are MEV median estimate obtained using average values 
of the scale and shape parameters. We see how in all case the MEV me- 
dian estimate coincides with the expected values (truth). The GEV median 
estimates consistently underestimates the correct probability £(y) (overesti- 
mates the precipitation value associated with a given return period), while 
the Gumbel median estimate consistently overestimates it (underestimate 
the precipitation value associated with a given return period). 

The results of Sec. 13.3. II show that for the artificial sequences adopted in 
the experiments (1), (2), and (3) the influence of the a varaible cardinality 
amount to the adoption (for y ^> C max , C max being the maximum value 
of the scale parameter in the mixture) of the penultimate approximation 
formula, Eq. (ll4p with a cardinality equal to the average cardinality of the 
maxima sample. We verified this prediction running experiments (l)-(3) 
with a variable cardinality. The results are not reported for brevity. 

4.3 The question of homogeneity 

In the previous Section we have demonstrated the superiority of the metas- 
tatistics approach over the generalized extreme value and Gumbel prescrip- 
tions. We are now in a position to address the question of homogeneity in 
the Padova time series. We selected 5 intervals: 1725-1764, 1768-1807, 1841- 
1880, 1841-1920, and 1887-2006. With the help of Fig. El we formulated the 
following hypotheses. During the intervals 1725-1764, 1768-1807 and 1887- 
2006 the sequence of daily amount of rain appears to be homogeneous, during 
the interval 1841-1920 appears to be inhomogeneous, while we are undecided 
regarding the the 1841-1880 interval. 

To check the validity of these hypotheses, we calculate for each of the five 
intervals the scale C and shape parameter w of the stretched exponential 
function fitting the probability ^(h) for the daily amount of rain to exceed 
a threshold h, given h > 10 mm. This is equivalent to consider, at least 
initially, the daily amount of rain in each interval as a homogeneous process. 
Then given an interval, we use the computed scale and shape parameters to 
generate 1,000 artificial homogeneous sequences. Each of these sequences is 
then divided in non overlapping subsets of duration 10,5,2, and 1 year. For 
each subset length we consider the MEV estimate of the probability ((y) 
(the probability for a maximum to not exceed the threshold y) is calculated 
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via Eqs. ( TT4"j) and ( TT5"j) . The results of the 1,000 repetitions are used to 
calculate the 5%, 50%, and 95% percentile of ((y) given y. The rationale 
is that if inside an interval the daily amount of rain is an homogeneous 
process we expect the MEV estimate of ((y) calculated with subsets of m 
(10,5,2, and 1) years to be inside the 5%, 95% percentile range of the MEV 
estimate calculated under the hypothesis of homogeneity (note that since we 
use the penultimate approximation in the MEV formula, a MEV estimate in 
homogeneous condition is different from a GEV estimate). 

Figure @] presents the results of this analysis adopting a Gumbel plot: 
— In ( — In ( C ( .'7 ) ) ) versus y. In panels (a) and (c). the gray shadowed area 
depict the 5%, 95% percentile range calculated via montecarlo simulation 
under the hypothesis of homogeneity, while the black square indicate the 
50% percentile (median). Moreover the colored solid lines represent the MEV 
estimate done with subsets of different lengths of the time interval considered, 
(red for 1 year, green for 2 years, blue for 5 years, pink for 10 years, and 
black for the entire interval). Finally the curves relative to 5, 2, and 1 
years subsets are shifted by 25, 50, 75 mm for clarity. From panel (a) we 
conclude that for the interval 1887-2006 the daily amount of rain can be 
considered a homogeneous stochastic process. In fact for all different subset 
lengths the MEV estimate reside inside the 5%, 95% percentile interval of the 
homogeneous hypothesis. Similar results (not reported here for brevity) hold 
for the intervals 1725-1764, 1768-1807, and 1841-1880. If we consider the 
interval 1840-1920, panel (c), we see that the hypothesis of homogeneity is 
not tenable as the MEV estimate relative to 10 years and 5 years long subsets 
(pink and blue curves) resides clearly outside the 5%, 95 % percentile range 
expected for the homogeneous case. The estimate relative to 2 years long 
subsets (green curve) is at the border of the 5%, 95 % percentile range, while 
the one for 1 year long subset (red) is inside (we will offer an explanation for 
this effect in the following). 

Finally, panel (b) and (d) report only the median value of the estimates 
relative to different subset lengths without any shift. As the length of the 
subset, used for the preconditioned metastatistics methodology, decreases 
(10 years to 1 year) the estimate median probability ((y) decreases in value 
for any fixed y (the daily amounts of rain relative to a given return period 
increase). This effect is due to progressive decrease of statistical accuracy, 
moving from 10 years to 1 year, in fitting the tails of the probability ^(h) 
for the daily rainfall amount to exceed a threshold h given h > 10 mm. As 
the length of the subset decreases larger fluctuations (with respect the value 
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relative to the entire set, see also panel (b) and (c) of Fig. (J2]) for the shape 
and scale parameter are observed. These fluctuations are approximatively 
symmetric, however their effects on the annual maxima are not. The right 
tail of zeta(y) is dominated by those fluctuations of the scale and shape 
parameters increasing the probability of observing larger (with respect to the 
entire set case) maxima. In turn, this implies a decrease for the cumulative 
distribution ((y). These last results might explain why, for the interval 1840- 
1920, the 2 years and 1 year long subsets estimate are respectively close to, 
and inside the 5%, 95% percentile range expected for a homogeneous process: 
the fluctuation due to the lack of statistics might be large enough to hide the 
inhomogeneity of the data. 

4.4 Predictions for structural stability 

In the previous Section we have shown that for the time intervals 1725-1764, 
1768-1807, 1841-1880, and 1887-2006 the daily amount of rainfall can be 
considered a homogeneous process: fixed scale and shape parameter. For 
these intervals, the probability ((y) for the yearly maximum daily amount 
to not exceed the threshold y can be calculated with the MEV formalism as 

1 T 1 T 

C(y) =t^ > n ^ c ii w o) = rYl £( h > n i> c > w ) 
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3=1 

exp(-exp(-^(^ - C w \n < n >) 

For the time interval 1840-1920 the daily amount of rain could not be con- 
sidered a homogeneous process. The analysis of the previous Section and 
Fig. H] suggest that, for this time interval, the MEV estimate calculated with 
5 years long subsets should be used. The results relative to the 1 year and 2 
year subsets are too noisy to be trustworthy. The results relative to 10 years 
long subsets are appreciably different from those relative to 5 years: the 5 
year long subset better resolve the inhomogeneity of the interval. 

Figure [5] depicts the Gumbel plot for the probability zeta(y) for the time 
intervals 1725-1764 (red curve), 1768-1807 (green curve), 1841-1880 (blue 
curve) 1887-2006 (pink curve), together with the 5 years long subset estimate 
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for the interval 1841-1920 (cyan curve). We first consider the 1,000 years 
return time intensity. These intesities are all estimates from MEV fits. The 
most conservative prediction is the one for the interval 1841-1880 (~110 mm), 
followed by the 1725-1764 (~130 mm). The estimates for the remaining 
intervals predict an intensity of ~ 160 mm. Overall there is a ~50 mm 
variability, not exactly a negligible one. In the cas of the 100 year return time 
intesity the varaibilty is ~30 mm (from 90 mm to 120 mm) which is also not 
negligible. Note that except for the 1887-2006 interval, which is 120 years 
long, all the 100 years return time intensites are estimates from the MEV fit. 
Overall, the variability observed is not connected to the procedure adopted 
to derive the predictions (preconditioning and metastatistics) which is the 
proper one, but to the fact that we sample different epochs with different 
climate "conditions". Even when we use the MEV estimates, we assume 
that the future will be the same as the present (the time interval used to 
make the prediction) but this is rarely the case. 

5 Conclusions 

The metastatistics approach described in this manuscript extend the extreme 
value theorem to statistical inhomogeneous cases. These are the most proba- 
ble cases occurring in nature. In particular, we have applied the metastatis- 
tics to the case of Weibull variates . In this case the metastatistic approach 
coupled with the practice of preconditioning offers the correct solution while 
the standard method (fitting the maxima with the generalized extreme value 
distribution) adopted in literature does not. The case of Weibull variates is 
of particular importance because the distribution of daily amount of rainfall 
is right tail equivalent to a Weibull distribution [TTJ. Thus the metastatistics 
approach together with the penultimate approxiamtion are the proper tools 
to address the important question of predicting the frequency of extreme 
hydrological events. We have done so using the Padova time series. Five 
different predictions have been derived: one for each time interval of time 
series considered. The variability observed for the intensity of the 1,000 (100) 
year return time event is of the order of 50 (30) mm: not a negligible one. 

These limitations reflects the fact that different climate condition have 
been adopted, and that one (as in all works in literature) consider the climate 
condition with the prediction is done to be valid also in the future. Using the 
1841-1880 time interval we have 110 mm daily amount with a return time of 
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1,000 years. But, using the 1887-2006 time interval we would predict 160 mm 
as the daily amount with occurring on average once in a millennium. How 
to bypass this limitations in the case of hydrological extreme? We need to 
connect the daily precipitation dynamics to the climate parameters which can 
more accurately be estimated by climate models. In practice the shape C and 
scale w parameter are dependent on some of the climatological parameters 
7^. Note that any dependece is likely to be stocahstic in nature rather 
than a deterministic one. Thus in theory we can use the dependance and 
the "proper" estimate of the future value of the climatological parameters to 
"estimate" what would be the future metastatistics factor /f u t(C, w) to use in 
the MEV formula, Eq. (11 6p . This will enable one to make a prediction of the 
frequency of extreme events which matches the "future" climate condition 
and not the current one. 
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Figure 2: The annual maxima M (solid line in top left panel), the number of 
wet days n w per year, (solid line in middle-top left panel), the scale parameter 
C (middle-bottom left panel), and the shape w parameter (bottom left panel) 
for the four intervals (indicated by the shadowed region) of the Padova time 
series. For the shape and scale parameter, we report the values obtained when 
using the distribution relative to the entire data in the interval (horizontal 
black line), the values obtained using 1 year windows (red solid line), and 
the values obtained using 10 year windows (solid blue line). Panels on the 
right report the observed frequencies for the variables displayed on the left 
panels. 
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Figure 3: Gumbel plot of the probability ((y) of exceeding the value y for a 
maxima for three different artificially generated sequences. Panels (a), (b), 
and (c) refer respectively to the experiment (1), (2), and (3). The solid lines 
indicate: the median of the preconditioned metastatistics estimate (blue), the 
median of the generalized extreme value estimate (red), the median of the 
Gumbel estimate (green), the median of the preconditioned metastatistics 
estimate obtained using average values of scale and shape parameter (pink). 
Black empty square denote the observed expected results (truth). 
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Figure 4: Gumbel plots of the probability ((y) of exceeding the value y. Panel 
(a): preconditioned metastatistics estimate for the time interval 1887-2006 
using the whole set (black line), 10 years long subsets (pink line), 5 years 
long subsets (blue line), 2 years long subsets (green line), and 1 year long 
subsets (red line). The areas shadowej^in gray indicate the spreading (5% to 
95% percentile), expected in the homogeneous case, while the black squares 
indicate the median. The curve relative to 5, 2, and 1 years subset are shifted 
for clarity. Panel (b): as panel (a) but only preconditioned metastatistics 
estimates with no shift. Panels (c) and (d): as panels (a) and (b) but for the 
1841-1920 time interval. 
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Figure 5: Gumbel plot of the probability £(y) of exceeding the value y for the 
time intervals 1725-1761, 1768-1807, 1841-1880, 1841-1920, and 1887-2006. 
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